Collapse whitespace #12

ralph-schleicher · 2021-09-15T09:30:38Z

I had a problem when parsing a UTF-8 encoded RDF/XML document.

 (wilbur:collapse-whitespace "Grüß Gott")

Signals an error:

 Cannot decode this: (#\LATIN_SMALL_LETTER_U_WITH_DIAERESIS
                      #\LATIN_SMALL_LETTER_SHARP_S #\  #\G #\o
                      #\t #\t)
    [Condition of type SIMPLE-ERROR]

Here is a fix.

… sufficient.

lisp · 2021-09-15T09:43:39Z

the principal issue is that wilbur, in its inherited form, predates any reasonable support for utf encoding. the whitespace predicate appears to reflect this. i am not a user (see dydra.com) and set it up here so that it would not disappear. if you intend to use it, would you consider integrating the capabilities which are now present in lisp implementations? also, your whitespace characterization seems suspect.

arademaker · 2021-09-15T13:48:26Z

A long time ago I faced a similar problem. A better solution would be to remove all XML functions from Wilbur and delegate them to an external library such as https://common-lisp.net/project/cxml/

lisp · 2021-09-15T14:39:02Z

as i noted earlier, my interest in wilbur is curatorial.
the current dependencies are minimal.
preferences are for corrections rather than introducing significant dependencies.

it would be a significant compromise of its artifactual status to introduce a large dependency to replace an aspect which, in its state of incompleteness, expresses a judgement about the role of rdf-xml.
especially when the rdf ecosystem has moved on to numerous encoding alternatives.
while open-source lisp rdf environments have not, it is not clear that replacing that component of wilbur would change that situation.

i am open to being convinced otherwise.

ralph-schleicher · 2021-09-15T17:19:33Z

> the principal issue is that wilbur, in its inherited form, predates any reasonable support for utf encoding. the whitespace predicate appears to reflect this.

The character encoding should be handled by the reader, e.g. the web client - I'm using Drakma, which works well for me. Wilbur only has to deal with Lisp characters.
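To illustrate the point about working on Lisp characters only, here is a minimal sketch of a whitespace collapse that does no byte decoding at all. It is not the code from this pull request: the names `xml-whitespace-char-p` and `collapse-whitespace-sketch` are hypothetical, and the behavior (trim both ends, replace inner runs with one space) is an assumption about what is intended. The default predicate uses the four whitespace characters from the XML 1.0 `S` production.

```lisp
;; Hypothetical helper, not part of Wilbur.
;; Assumes CHAR-CODE returns Unicode/ASCII code points, which holds on
;; current implementations but is not guaranteed by the standard.
(defun xml-whitespace-char-p (char)
  "True if CHAR is space, tab, line feed, or carriage return."
  (member (char-code char) '(#x20 #x09 #x0A #x0D)))

;; Hypothetical sketch, not the function changed by this pull request.
(defun collapse-whitespace-sketch (string
                                   &optional
                                     (whitespace-p #'xml-whitespace-char-p))
  "Return a copy of STRING without leading or trailing whitespace and
with every inner run of whitespace replaced by a single space."
  (with-output-to-string (out)
    (let ((pending nil)   ; true while a whitespace run awaits output
          (started nil))  ; true once a non-whitespace char was written
      (loop for char across string
            do (cond ((funcall whitespace-p char)
                      (setf pending t))
                     (t
                      ;; Emit one space for the whole preceding run,
                      ;; but never before the first written character.
                      (when (and pending started)
                        (write-char #\Space out))
                      (write-char char out)
                      (setf pending nil
                            started t)))))))
```

Because the input is already a Lisp string, characters such as `ü` and `ß` pass through untouched. A different notion of whitespace can be supplied through the optional predicate argument.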

> i am not a user (see dydra.com) and set it up here so that it would not disappear. if you intend to use it, would you consider integrating the capabilities which are now present in lisp implementations?

How do you imagine that? Shall I take over maintenance?

> also, your whitespace characterization seems suspect.

I don't think so. It's according to the ASCII definition.

lisp · 2021-09-15T18:41:08Z

if the goal is to make this artefact behave better with unicode, then wrt whitespace the class of non-graphic characters does not coincide with that of whitespace characters. https://en.wikipedia.org/wiki/Whitespace_character
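A small sketch of that distinction, using only standard functions. The helper names `ascii-whitespace-char-p` and `classify-char` are hypothetical, and the result for the BEL character (code 7) assumes an implementation whose character codes follow ASCII and which treats control characters as non-graphic.

```lisp
;; Hypothetical helper: the six ASCII whitespace characters
;; (tab, line feed, vertical tab, form feed, carriage return, space).
(defun ascii-whitespace-char-p (char)
  (member (char-code char) '(#x09 #x0A #x0B #x0C #x0D #x20)))

;; Hypothetical helper: report both properties for one character.
(defun classify-char (char)
  (list :whitespace (and (ascii-whitespace-char-p char) t)
        :graphic (and (graphic-char-p char) t)))

;; #\Space is whitespace and also graphic.
(classify-char #\Space)
;; #\Newline is whitespace and non-graphic.
(classify-char #\Newline)
;; BEL is non-graphic but is not whitespace (implementation assumption).
(classify-char (code-char 7))
```

So a test of the form "not graphic" both misses the space character and accepts control characters that are not whitespace.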

lisp · 2021-09-15T18:45:01Z

> The character encoding should be handled by the reader, e.g. the web client - I'm using Drakma, which works well for me. Wilbur only has to deal with Lisp characters.

in some sense that is true, but the change removes the primitive support which it had for utf decoding in that one function, which leaves one to wonder where else that situation applies.

lisp · 2021-09-15T18:45:41Z

> How do you imagine that? Shall I take over maintenance?

you are certainly free to fork it.

ralph-schleicher added 2 commits September 15, 2021 11:09

xml-util.lisp (whitespace-char-p): Don't use a macro if a function is…

c1cc3ba

… sufficient.

xml-util.lisp (collapse-whitespace): Full Unicode support.

e44f3bf
